A synchronising scheme for processing "locked"
memory access requests from various computer 
system units is provided by a System Control Unit 
(SCU) (14). The SCU is provided with a directory of 
lock bits in such a way that each memory address 
associated with a memory access request is 
mapped to a corresponding location in the lock 
directory. Incoming requests for a given memory 
location are processed by interrogating the corre- 
sponding lock bit in the lock directory. If the lock bit 
is not set, the request is granted. The lock bit is
subsequently set and maintained in that state until
the unit requesting the lock has completed its mem-
ory access operation and sends an "unlock" re-
quest. If the interrogated lock bit is found to be set,
the lock request is denied and the requesting port is
notified of the denial and requested to try again.
Fairness is incorporated in the processing of denied
lock requests by means of a reserve list onto which
denied requests are sequentially positioned on a
first-come-first-served basis.
Synchronising and Processing of memory access operations



This invention relates generally to multiproces- 
sor computer systems. More particularly, this in- 
vention relates to the synchronising and processing 
of memory access operations in such a computer 
system where multiple processor units and input- 
output units share data structures in the common 
system memory. 

According to a first aspect of the present in- 
vention there is provided in a multiprocessor sys- 
tem of the kind having a system control unit (SCU) 
for operating a plurality of central processor units 
(CPUs) in a parallel fashion in combination with at 
least one input/output (I/O) unit, wherein each of said
CPUs and I/O units is ported into the SCU, the
SCU being operative to controllably allow said
CPUs and said I/O units to access addressed seg- 
ments of a system memory, a method of process- 
ing, in an efficient and fair manner, locked memory 
access requests received at the SCU ports from 
one or more of said CPUs and said I/O units, said 
method comprising: storing within said SCU a di- 
rectory of lock bits, each of said lock bits cor- 
responding to a predefined segment of memory 
locations in said system memory, the status of
each of said lock bits, asserted or not asserted, 
representing whether or not the corresponding 
memory segment is locked; ascertaining, for each 
lock request being processed, whether or not the 
lock bit corresponding to the memory segment 
containing the memory location being addressed 
by said lock request is set; and either granting said 
lock request to said requesting unit if said cor-
responding lock bit is not set, or denying said lock
request to said requesting port if said correspond- 
ing lock bit is found to be set. 

Preferably, each of said CPUs includes an as- 
sociated cache memory capable of storing a block 
of memory containing a fixed number of memory 
locations, and wherein each of said lock bits in said 
SCU lock directory corresponds to the block of 
memory defined by said CPU caches, whereby all 
memory addresses which are associated with a
memory access request and which fall within a 
cache block are mapped to the same lock bit in 
said SCU directory. 

The lock requests that are denied because the
memory location addressed by the said request
corresponds to a locked cache block may be further
arbitrated by the SCU, as follows: defining a re-
serve list adapted to store a set number of denied
requests in a sequential order; storing a denied
request on said reserve list if (1) the list is not full
with prior denied and reserved requests and (2) the
denied request has not already been placed on the
list; and granting the oldest denied lock request
existing on said reserve list, when said correspond-
ing locked cache block, which originally resulted in
the denial of said oldest lock request, becomes
unlocked.

According to a second aspect of the present
invention there is provided in a multiprocessor sys-
tem of the kind having a system control unit (SCU)
for operating a plurality of central processor units
(CPUs) in a parallel fashion in combination with at
least one input/output (I/O) unit, wherein each of
said CPUs and I/O units is ported into the SCU, the
SCU being operative to controllably allow said CPUs
and said I/O units to access addressed segments
of a system memory, the improvement comprising:
means for storing within said SCU a directory of
lock bits, each of said lock bits corresponding to a
predefined segment of memory locations in said
system memory, the status of each of said lock
bits representing whether or not the corresponding
memory segment is locked; means for
determining, for each lock request being pro-
cessed, whether or not the lock bit corresponding
to the memory segment containing the memory
location being addressed by said lock request is
set; and means for granting said lock request to
said requesting unit if said corresponding lock bit is
not set; and means for denying said lock request to
said requesting port if said corresponding lock bit
is found to be set.

Preferably, each of said CPUs includes an as-
sociated cache memory capable of storing a block
of memory containing a fixed number of memory
locations, and wherein each of the lock bits in the
said SCU lock directory corresponds to the block
of memory defined by said CPU caches, whereby
all memory addresses which are associated with a
memory access request and which fall within a
cache block are mapped to the same lock bit in
said SCU directory.

There may be arbitrating means comprising:
means for defining a reserve list adapted to store a
set number of denied requests in sequential order;
means for storing a denied request on said reserve
list if (1) the list is not full with prior denied and
reserved requests and (2) the denied request has
not already been placed on the list; and
means for granting the oldest denied lock request
existing on said reserve list, when said correspond-
ing locked cache block, which originally resulted in
the denial of said oldest lock request, becomes
unlocked.

In the specification and claims the reference to 
a bit being "set" or "unset" may be a reference to 
that bit respectively having the logical values of 1 
and 0; or, alternatively, respectively the logical val-
ues 0 and 1.
In high-performance multiprocessor computer
systems, a plurality of central processor units 
(CPUs) are typically operated in a parallel fashion 
in conjunction with other system units, including 
several input/output (I/O) units, by providing all 
system units with relatively autonomous accessibil- 
ity to a common system memory (the primary or 
main memory). These system units are capable of
both reading from and writing to locations within 
the main memory. Because the system units share 
data structures in main memory and because 
memory access requests originating from these 
units are asynchronous in nature, memory access 
conflicts arise when access to identical locations in 
memory is requested by different system units at 
the same time. Accordingly, it is imperative to 
control access to main memory in such a manner 
that memory access requests are sequenced cor- 
rectly so as to avoid access conflicts. Memory 
interlocks are commonly used for this purpose. 

Memory interlocks are mechanisms implement- 
ed in hardware or software to coordinate the activ- 
ity of two or more memory access operations 
(initiated by a system unit, such as a CPU or I/O 
unit) and generally ensure that one process has
reached a suitable state such that the other may 
proceed. Because the conflicting access operations 
use common memory resources, interlocks will se-
quence the requests to guarantee that one request
is honored at a time, and perhaps that some dis-
cipline, such as first-come-first-served, is observed.

In particular, when a system unit, such as a 
CPU, performs an interlocked read to memory, no 
other unit is allowed interlocked access to the 
same location until the first processor does a write 
unlock to release the lock. This type of sequencing 
allows controlled sharing of data structures without 
giving rise to "race" conditions which would other- 
wise occur if interdependent memory access oper- 
ations were to be executed without proper sequen- 
cing. Interlock mechanisms are physically imple- 
mented within the memory system by designating 
interlock bits associated with memory locations that 
need to be locked. Thus, when a CPU needs to 
read or write to a memory location while precluding 
the possibility of having the contents of that loca- 
tion affected by being accessed by another CPU or 
I/O unit, the CPU performs a "locked" access to 
the desired location. This access causes a cor- 
responding lock bit to be set. Any other memory 
access operation addressed to the "locked" loca- 
tion involves the initial testing of the corresponding 
lock bit. If the bit is found to be set, i.e., the
requested memory location has been "locked", the 
access request is denied. Once the access request 
which initiated a lock has been completed, the lock 
bit is reset or "unlocked" and a subsequent mem-
ory access request addressing the previously
locked memory location is allowed to go through.
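
The interlocked read and write-unlock sequence described above can be sketched as follows. This is a minimal illustration only, assuming one interlock bit per memory location; the class and method names (InterlockedMemory, interlocked_read, write_unlock) are hypothetical and do not come from the specification.

```python
class InterlockedMemory:
    """Toy memory in which every location carries one interlock bit (assumed layout)."""

    def __init__(self, size):
        self.data = [0] * size
        self.lock_bits = [False] * size  # one interlock bit per location

    def interlocked_read(self, addr):
        """Return (granted, value); the request is denied if the bit is already set."""
        if self.lock_bits[addr]:
            return False, None
        self.lock_bits[addr] = True      # the locked access sets the bit
        return True, self.data[addr]

    def write_unlock(self, addr, value):
        """Write the new value and reset the interlock bit."""
        self.data[addr] = value
        self.lock_bits[addr] = False


mem = InterlockedMemory(8)
granted_a, value = mem.interlocked_read(3)     # unit A locks location 3
granted_b, _ = mem.interlocked_read(3)         # unit B is denied while A holds the lock
mem.write_unlock(3, value + 1)                 # unit A completes and releases the lock
granted_c, value_c = mem.interlocked_read(3)   # a later request goes through
```

Because the second request is denied rather than interleaved with the first, the read-modify-write performed by unit A cannot be disturbed by unit B.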

Since the use of interlock bits is restricted to
synchronizing problems only, which occur infre-
quently and only when locked access is requested
to the same memory locations, it becomes highly
impractical to provide a single lock bit for the whole
memory system. Under this arrangement, when a
CPU makes a locked read or write to even a single
memory location, the complete system memory is
locked up to subsequent lock requests. Non-locked
reads and writes, however, proceed as usual even
when the lock bit is set. The single lock bit sys-
tems performed adequately since the number of
interlock requests was relatively low so that request
synchronization was infrequently called for. Under
such conditions, a considerably less dense alloca-
tion of lock bits was found to be satisfactory.

However, as the operational speed for CPUs
increased, as larger memories became typical, and
as multiprocessing, using shared memory re-
sources, and parallel processing by problem de-
composition, became increasingly popular, the
average number of interlock requests increased
dramatically to the point where it is essential to
have a locking granularity which is considerably
less than that of the overall memory system. Single
lock bit systems are no longer capable of efficiently
sequencing memory access requests which fre-
quently are interlocked in nature.

A further consideration in the use of interlock
bits is the physical location in which they are to be
stored. It may appear to be convenient to allocate a
section of main memory for storing the interlock
bits since all memory access requests ultimately
do reach the main memory. There are, however, a
number of disadvantages inherent to such a
scheme. By storing the interlock bits in the main
memory, a substantial portion of the main memory
becomes dedicated to this function only and can-
not be used for other important purposes.

More significantly, checking the interlock bits
when they are stored in the main memory would
be a very slow process. Main memory, because of
its great size, is ordinarily constructed from low-
speed, low-cost memory components so as to re-
duce the overall computer system cost. In order to
prevent the main memory from slowing down the
overall operation of the computer system, high-
speed caches are employed in each of the CPUs.
These caches contain selected portions of the main
memory so that the CPU can normally access the
desired data without accessing the main memory.
If the interlock bits are stored in the main memory,
then each CPU request for data must be preceded
by a main memory access to interrogate the inter-
lock bit, irrespective of whether the data is present
in the high-speed cache or in the main memory.

An alternative is to store the interlock bits in
the CPU caches in an attempt to solve the speed
problems inherent in storing the interlock bits in the 
main memory. However, this approach is equally 
problematic primarily because memory access 
functions that originate in the units are not stored in 
the CPU caches and the I/O units, which are rela-
tively slow and have no caches of their own, ac-
count for a large percentage of all locked memory
access requests. Thus, it would be impractical to 
employ the high-speed CPU caches for storing 
interlock bits; problems associated with potential 
main memory access conflicts between CPUs and 
I/O units would still remain unsolved. 

The present invention is directed to overcom-
ing one or more of the above-referenced problems.

Briefly, in accordance with this invention, an 
efficient synchronizing scheme is provided for pro- 
cessing "locked" memory access requests from 
various system units in a multi-processing system. 
The processing is handled in such a manner that 
requests are processed according to a locking 
granularity substantially less than that of the overall 
system memory while, at the same time, ensuring 
that the multiple requests are granted fair and 
quick access to desired locations in shared mem- 
ory. 

According to a feature of this invention, the 
monitoring and control of all lock requests is han- 
dled by the SCU. This is in view of the fact that all 
transactions to and from memory are processed 
through the SCU in a multi-processing system of 
the type where the SCU controls the parallel opera- 
tion of a plurality of CPUs and I/O units relative to 
the common main memory of this system. Locking 
granularity is defined at the level of individual 
cache blocks for the system CPUs and, accord- 
ingly, represents a granularity which is orders of 
magnitude less than that of the complete system 
memory, as used in conventional systems. The 
arrangement is advantageous in that the cache 
block also represents the unit of memory allocation 
in the illustrative system and particularly because 
the SCU, by virtue of its involvement with all trans-
fers of memory blocks, is in a position to conve-
niently check if a block is locked or not.

Furthermore, the crippling disadvantages dis- 
cussed above that would accompany storage of 
lock bits either in the main memory or in the cache 
are avoided by storing a plurality of lock bits within 
the SCU itself. More specifically, the SCU is pro- 
vided with a lock directory defined so that each 
memory address associated with a memory access 
request being processed and synchronized is 
mapped to a location in the lock directory in such a 
way that addresses in the same block of memory 
are mapped to the same location. 

The synchronizing scheme of this invention
operates by interlocking, as a unit, blocks of mem-
ory containing designated memory locations. The
SCU, accordingly, for purposes of lock manage-
ment, need not be concerned with memory ad-
dresses of a granularity finer than that of the cache
block. In essence, an incoming memory access
request which requires a lock on a given memory
location is processed by first interrogating the cor-
responding lock bit in the SCU lock directory. This
is performed by using the memory address asso-
ciated with the request as an index into the lock
directory defined in the SCU. If the lock bit is not
set, the lock request is granted. The lock bit is
subsequently set and maintained in that state until
the memory access operation accompanying the
lock request is completed and generates an
"unlock" request indicating that the memory ad-
dress at issue, and, hence, the cache block con-
taining the address, may then be made available
for subsequent lock requests. The memory address
to which an unlock request is directed is used as
an index into the SCU lock directory, and the
corresponding lock bit is cleared. However, if the
lock bit corresponding to the memory address on
which an interlock is requested is found to be set
on interrogation, the lock request is denied and the
requesting system unit is notified of the denial and
requested to try again. Because denied lock
requests are returned to the requesting system unit
and are treated as having been otherwise pro-
cessed by the synchronizing scheme, a requesting
unit is not forced to wait in a deadlocked state for a
particular cache block to be unlocked before the
unit's memory access request is honored.
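
The grant, deny, and unlock sequence described above can be sketched as follows. This is a minimal sketch, not the SCU hardware: the 64-byte block size is taken from the illustrative embodiment, while the directory size of 1024 entries, the modulo indexing, and the names LockDirectory, lock_request, and unlock_request are assumptions made for illustration.

```python
BLOCK_SIZE = 64        # bytes per cache block (illustrative embodiment)
DIRECTORY_SIZE = 1024  # number of lock bits; assumed value for this sketch


class LockDirectory:
    """One lock bit per directory entry, indexed by cache-block address."""

    def __init__(self):
        self.bits = [False] * DIRECTORY_SIZE

    def index(self, addr):
        # All addresses within one cache block map to the same entry.
        return (addr // BLOCK_SIZE) % DIRECTORY_SIZE

    def lock_request(self, addr):
        i = self.index(addr)
        if self.bits[i]:
            return "DENIED"    # requester is notified and must try again
        self.bits[i] = True    # lock stays set until the unlock request
        return "GRANTED"

    def unlock_request(self, addr):
        self.bits[self.index(addr)] = False


directory = LockDirectory()
r1 = directory.lock_request(0x1000)   # first request for the block
r2 = directory.lock_request(0x1004)   # same 64-byte block, so it is denied
r3 = directory.lock_request(0x1040)   # neighbouring block, independent lock bit
directory.unlock_request(0x1000)
r4 = directory.lock_request(0x1004)   # retry succeeds after the unlock
```

A denied request returns immediately, so the requesting unit is free to do other work and retry, rather than stalling on the locked block.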

According to another aspect of this invention,
fairness in the processing of memory access re-
quests is ensured by the provision of a reserve list
onto which a lock request may be placed after it
has been denied. Any time a lock request from a
particular port is denied, it is positioned on the
reserve list provided it has not already been so
placed, and provided there is an available slot on
the reserve list. When the lock bit corresponding to
the requested cache block is subsequently un-
locked, the request which is found to be at the top
of the reserve list at the time is granted, while the
remaining requests in the list are all shifted upward
in the list by one slot. This guarantees a predefined
degree of fairness in the processing of denied lock
requests and the extent to which fairness is ex-
tended corresponds of course to the depth of the
reserve list.
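
The reserve list behaviour can be sketched for a single lock bit as follows. This is one possible reading, offered only as an illustration: it assumes that, after an unlock, the port at the top of the list is granted the lock when it retries, and that other ports continue to be denied until then. The class name FairLock, the port labels, and the depth of 2 used in the example are hypothetical.

```python
from collections import deque


class FairLock:
    """One lock bit plus a bounded first-come-first-served reserve list."""

    def __init__(self, depth=3):
        self.locked = False
        self.depth = depth        # number of slots on the reserve list (assumed)
        self.reserve = deque()    # oldest denied port sits at the left (the "top")

    def lock_request(self, port):
        # Grant only if the block is unlocked and no older request is ahead.
        if not self.locked and (not self.reserve or self.reserve[0] == port):
            if self.reserve:
                self.reserve.popleft()   # remaining entries move up one slot
            self.locked = True
            return True
        # Denied: take a slot if the port holds none and the list is not full.
        if port not in self.reserve and len(self.reserve) < self.depth:
            self.reserve.append(port)
        return False

    def unlock_request(self):
        self.locked = False


lock = FairLock(depth=2)
g0 = lock.lock_request("CPU0")   # granted, list empty
g1 = lock.lock_request("CPU1")   # denied, reserved in slot 1
g2 = lock.lock_request("IO0")    # denied, reserved in slot 2
g3 = lock.lock_request("CPU2")   # denied, list full, not reserved
lock.unlock_request()
g4 = lock.lock_request("IO0")    # still denied: CPU1 is older
g5 = lock.lock_request("CPU1")   # granted: oldest reserved request
remaining = list(lock.reserve)
```

In this sketch the depth of the list bounds how many denied requesters are guaranteed their turn in arrival order; a requester that finds the list full simply competes again on a later retry.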

The above arrangement works efficiently for
both CPUs and I/O units because lock bits are kept
in a separate directory in the SCU. It is not neces-
sary that the memory block to which an access
request is addressed be located in any particular
cache and, thus, lock requests from I/O units,
which typically have no indigenous cache of their
own, can also be synchronized without difficulty.

Additional objects and advantages of the inven-
tion will become apparent upon reading the follow-
ing detailed description and upon reference to the
drawings in which:

FIG. 1 is a block diagram illustrating a multi-
processor computer system using a system control
unit (SCU) for operating a plurality of central pro-
cessor units (CPUs) and input/output (I/O) units in which the
lock request processing and synchronizing scheme
of this invention may be effectively used;

FIG. 2 is a diagram illustrating the manner in 
which the cache status or tag directory within the 
SCU is used to define the lock bit and the reserve 
list; 

FIG. 3 is a flowchart illustrating the basic
sequential procedure on which the synchronizing
scheme of this invention is based;

FIG. 4 is a more detailed flowchart illustrat- 
ing the sequential procedure according to which 
lock requests are processed by either being grant- 
ed or denied on the basis of the status of the lock 
directory, and subsequently placed on reserve; 

FIG. 5 is a flowchart illustrating the manner 
in which the reserve list is maintained for use with 
the procedure outlined in FIG. 3; and 

FIGS. 6A-B are flowcharts of additional rou- 
tines that may be used in the sequential procedure 
of FIG. 5. 

Turning now to the drawings and referring in 
particular to FIG. 1, there is shown a block diagram
of a multi-processing system 10 which uses a 
plurality of central processing units (CPUs) 12 and 
is adapted to permit simultaneous, i.e., parallel, 
operation of the system CPUs by allowing them to 
share a common memory 16 for the system. The 
main memory 16 itself typically comprises a plural-
ity of memory units 16A and 16B. A system control
unit (SCU) 14 links the CPUs 12 to the main 
memory 16 and to an input/output (I/O) controller 
18. The I/O controller allows the processing system 
in general and the CPUs in particular to commu- 
nicate with the external world through appropriate 
I/O interfaces 20 and associated I/O units 20A for
the system. The SCU 14 may also link the various 
system modules to a service processor/console 
unit (SPU) 22 which regulates traditional console 
functions including status determination and control 
of the overall operation of the processing system. 

In the multi-processing system of FIG. 1, effi- 
cient communication between system units linked 
through the SCU 14 and the main memory 16 and,
more particularly, between each system CPU 12
and the individually addressable segments com-
prising each memory unit 16A, 16B is handled
through dedicated interface means 30. The specific
configuration of the main memory and the particu-
lar manner in which the SCU is interfaced to the
memory is not important to the present invention
and accordingly will not be discussed in detail
herein. Reference is hereby made to the above
referenced co-pending Gagliardo et al. U.S. patent
application Serial No. 306,326, filed 2/3/89, titled
"Method And Means For Interfacing A System
Control Unit For A Multi-Processor System With
The System Main Memory", also owned by the
assignee of the present invention, incorporated
herein by reference for details on preferred inter-
face means. For purposes of describing the
present invention, it suffices to state that each
memory unit 16A, 16B of the main memory 16 is
preferably split between two memory ports on the
SCU with each port being linked to two individually
addressable segments and all segments being in-
terleaved on block boundaries, as described in
detail in the aforementioned co-pending applica-
tion.

Each system unit, such as a CPU or an I/O
unit, is ported into the SCU 14 through a discrete
port and all communication requests to and from
memory and, more specifically, access requests
between memory and the system units, are lodged
at the corresponding port on the SCU. The SCU 14
functions to keep system units active while avoid-
ing inter-unit conflicts by handling requests for
communications between the system unit and the
system memory that are received at various ports
on the SCU. Because the various CPUs and I/O
units are operated in a parallel fashion within the
multi-processing system, a plurality of communica-
tion requests are routinely received at the SCU. In
addition, a number of such requests may typically
require access to the same system resources in
order to honor the requests by executing the com-
mands associated therewith.

It is, accordingly, an important function of the
SCU to process requests received at its ports from
the system units in a fashion that utilizes the sys-
tem resources in the most efficient manner and, in
addition, treats each arriving system request in a
fair manner by processing the request within a
reasonable period of time.

This function may be performed by use of
some form of arbitration technique for deciding the
selection and order of execution of incoming re-
quests. An exemplary arbitration technique particu-
larly suited for use with high-performance mul-
tiprocessor systems is disclosed in the above-re-
ferenced co-pending Flynn et al. U.S. Patent Ap-
plication Serial No. 306,843, filed 2/3/89, titled
"Method And Means For Arbitrating Communication
Requests Using A System Control Unit In A Multi-
Processor System", also owned by the assignee of
the present invention, and incorporated herein by
reference for details on preferred arbitration means.

EP 0 381 325 A2

It will be obvious that the communication re-
quests received and arbitrated by the SCU include
requests from system units, including CPUs and 
I/O units, to access main memory for read/write 
operations. Further, these memory access requests 
may include requests for "locked" access to mem- 
ory. The SCU initially proceeds with arbitrating 
such requests, preferably in accordance with the 
above-referenced arbitration scheme, to determine 
at any given time, the particular request to be 
processed which has all resources required by it 
available. Assuming that the selected request is a 
locked memory access request from a system unit, 
it then becomes the responsibility of the SCU to 
determine whether or not the requested lock on the 
memory location being addressed by the selected 
request should be granted. In other words, at this 
stage, the SCU needs to synchronize the process 
lock requests generated by two or more of the 
system units in connection with various cooperating 
processes, in such a way as to achieve the desired 
objectives of fairness, deadlock prevention, and 
optimum processing speed. 

In accordance with this invention, the process- 
ing and synchronizing of lock requests for access 
to memory is handled by the SCU itself. Synchro- 
nizing is performed on the basis of a locking 
granularity which is substantially less than that of 
the overall memory system. Since the SCU essen- 
tially regulates the interfacing of communication 
requests between system units and the system 
memory, it is advantageous to use the SCU for 
monitoring and controlling all lock requests. Fur-
ther, in a system of the type illustrated in FIG. 1,
locking granularity is defined at the level of individ- 
ual cache blocks for the system CPUs. Since the 
cache block represents the unit of memory alloca- 
tion in the multi-processing system described 
above, defining locking granularity at the cache 
block level conveniently allows the SCU to check 
the locked or unlocked status of a memory block
without undue delay. 

According to another important aspect of this 
invention, the SCU is provided with a lock directory 
comprising a plurality of lock bits which are defined 
in such a way that memory addresses associated 
with memory access requests being processed are 
mapped to a particular location in the lock direc- 
tory. Further, the mapping is done in such a man- 
ner that addresses defined within the same block 
of memory are mapped to identical locations within 
the lock directory. The lock directory is defined as 
an integral part of the central "Tag" directory used 
within the multi-processing systems to keep track 
of the status of caches within individual processors. 
Such a central directory of tag storage includes a
location corresponding to every block of memory 
stored within a CPU cache. Each such location 
includes bits which define the nature (valid, invalid,
read, write) of data stored in the block. Defining the
lock directory as being part of the tag directory
avoids the need for additional storage which would
otherwise be required to maintain the lock direc-
tory. Instead, the RAMs used for defining the cache
tag directory can also be used to define and main-
tain the lock directory. More specifically, a single
lock bit is provided for each entry in the SCU tag
directory and a list of reserve bits is also imple-
mented therein and maintained on the basis of
sequential reserve of denied lock requests up to
the maximum number of reserve bits defined on
the list, as will be described in detail below.

In the case of lock requests, the "low" seg-
ment of the memory access address accompany-
ing a lock request is used as an index into the lock
directory to generate an output signal indicating
whether or not the address location is locked. Each
lock bit location corresponds directly to the tag
referencing the same block within a CPU cache.
Such a cache stores a segment of main memory
and by using only the "low" segment of the mem-
ory access address, blocks of memory which are
far enough apart in the main memory share the
same location in the lock directory. For instance, in
the illustrative embodiment of FIG. 1, a cache block
is defined to be 64 bytes wide. Accordingly, if a
thousand lock bits are used to define the main
memory of the system, blocks which are suffi-
ciently apart, i.e., which are apart by a multiple of 64
x 1000 = 64K bytes, share the same location within
the tag directory. Hence, two memory blocks which
have the identical "low" address segments will
show up as being locked even if only one is
actually locked. This arrangement has been found
to provide an efficient compromise between the
extreme choices of defining either a single lock bit
for the whole memory or a single lock bit for every
memory location.
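
The arithmetic behind this sharing of lock bits can be checked with a short sketch. It assumes that "a thousand" lock bits means 1K = 1024 entries, which is what makes the product come out at exactly 64K bytes, and that the "low" address segment is obtained by discarding the byte offset within the block and keeping the next ten bits; the function name lock_index is hypothetical.

```python
BLOCK_SIZE = 64        # bytes per cache block (illustrative embodiment)
NUM_LOCK_BITS = 1024   # "a thousand" lock bits read as 1K; an assumption


def lock_index(addr):
    """Low address segment used as the index into the lock directory."""
    return (addr // BLOCK_SIZE) % NUM_LOCK_BITS


# Blocks this many bytes apart share one lock bit: 64 * 1024 = 65536 (64K).
ALIAS_STRIDE = BLOCK_SIZE * NUM_LOCK_BITS

a = 0x2040
b = a + ALIAS_STRIDE   # 64K bytes away: same lock bit as a
c = a + BLOCK_SIZE     # adjacent block: different lock bit
```

Under these assumptions, a lock held on the block containing `a` would also cause a lock request for `b` to be denied, which is the false sharing the passage describes; a request for `c` would be unaffected.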

An illustrative arrangement for defining the lock
directory within the tag directory of the SCU is
shown in FIG. 2 which illustrates the manner in
which the representative structure of a global tag
directory for maintaining CPU cache status is
adapted to have defined therein designated seg-
ments for storing the lock bits as well as the
reserve bits which define the reserve list. Thus, all
lock bits are stored within the same RAMs where
the status of the CPU caches is normally stored.

As shown in FIG. 2, the global tag structure 100
includes four (4) separate RAM structures des-
ignated as RAM 0, RAM 1, RAM 2, and RAM 3
which correspond to the caches in the correspond-
ing CPUs CPU 0, CPU 1, CPU 2, and CPU 3. Each
RAM is dedicated to one CPU and, according to a
preferred implementation, has a capacity of four
kilobytes (4K) so that four separate 1-K sections
may be defined. The first section 102 holds the Set
0 status for the CPU while the second section 104
holds the Set 1 status. According to a preferred
embodiment of this invention, the third section 106
is used to define the lock bits. The fourth section
108 is left unused and can be used to advantage if
the locking granularity is required to be further
reduced by the addition of more lock bits.
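
The division of each 4K RAM into four 1-K sections can be pictured with the following sketch. It is illustrative only: the ordering of the sections in the RAM address space, the treatment of each section as 1024 addressable entries, and the names SECTIONS and ram_address are assumptions, not details given in the specification.

```python
SECTION_SIZE = 1024  # entries per 1-K section (assumed addressing unit)

# Assumed order of the four sections 102, 104, 106, 108 within one RAM.
SECTIONS = {
    "set0_status": 0,  # section 102: Set 0 cache status
    "set1_status": 1,  # section 104: Set 1 cache status
    "lock_bits": 2,    # section 106: lock bits
    "unused": 3,       # section 108: spare, available for more lock bits
}


def ram_address(section, index):
    """Return the RAM entry address for an index within a named section."""
    if not 0 <= index < SECTION_SIZE:
        raise ValueError("index outside the 1-K section")
    return SECTIONS[section] * SECTION_SIZE + index


lock_base = ram_address("lock_bits", 0)
last_entry = ram_address("unused", SECTION_SIZE - 1)
```

With this layout the same index that selects a cache status entry also selects the lock bit for that block, so only the section select differs between a tag lookup and a lock lookup.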

Write-Enable (W.E.) lines 110, 112, 114, and
116 are provided for enabling the writing of tags
corresponding to the four RAM structures RAM 0-3.
Memory addresses accompanying access requests
are directed through a multiplexer arrangement 118
which has its "select" input enabled by the tag
index 122 to pick out the memory address for
which the cache status is required. A tag "select"
signal 122 is used to pick between the cache
status for Sets 0 and 1, respectively, of a particular
CPU. In FIG. 2, only one set of RAMs is shown for
each RAM structure. In actuality, a plurality of 
RAMs are typically provided for each RAM struc- 
ture. In the illustrative embodiment of FIG. 1, for 
instance, six RAMs are provided for each RAM 
group. A 3-bit tag "status" input 124 is also pro- 
vided to each of the RAM structures and the cache 
status is sequentially read out in consecutive read 
cycles as output data while the corresponding 
memory addresses are also generated at the out- 
put. These addresses pass through a comparison 
unit to determine if there is a match between the 
memory address corresponding to a memory ac- 
cess request and the contents of the corresponding 
cache block. 

It is significant that the global tag structure 
described above is particularly adapted to the 
maintenance and read-out of the status of cache 
blocks within individual CPUs. Typically, the cache 
status indicates whether a particular cache block 
contains data which is either invalid, read, written
partial, or written full. The use of segments of the 
same RAMs which are used to define the cache 
status tag directory for storing the lock bits is 
advantageous in that exactly the same hardware 
used for maintaining and updating the cache status 
of the CPUs can be used for maintaining and 
updating the lock directory. As a matter of fact, the 
global tag structure shown at FIG. 2 is identical to 
the tag structure otherwise used by the SCU for 
maintaining the status of individual CPU caches. 
Reference is hereby made to co-pending Arnold et 
al. U.S. Patent Application Serial No. 306,776 filed 
2/3/89 for "Improved Scheme For Insuring Data 
Consistency Between A Plurality Of Cache Memories 
And The Main Memory In A Multi-Processor 
Computer System", also owned by the assignee of 
the present invention, wherein details are provided 
as to the structural nature of the CPU caches and 
the manner in which the cache tags may be used 
to specify the status of individual CPU caches. The 
disclosure in the aforementioned application is 
incorporated herein by reference. 

It should be noted in FIG. 2 that the lock bits 
are provided only in the first RAM structure and, 
accordingly, the lock directory includes a separate 
lock bit for each of the system units. The corresponding 
bits in the remaining three RAM structures 
RAM 1, RAM 2, and RAM 3 are respectively 
designated as reserve bits R1, R2, and R3 for each 
of the system units. The allocation of reserve bits 
and the manner in which they are updated as lock 
requests are denied will be described in detail 
below. 
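
The layout just described can be pictured with a short sketch. The Python below is illustrative only and is not part of the disclosed hardware: the class name `LockEntry`, the assumed count of four ports, and the use of integer bit masks (one bit per port) are all assumptions made for the example. It also assumes R3 is the highest reserve priority, since R1 is stated to be the lowest.

```python
from dataclasses import dataclass

NUM_PORTS = 4  # assumed number of system units (ports); not given in the text


@dataclass
class LockEntry:
    """One lock-directory entry, modelled as four per-port bit masks."""
    lock: int = 0  # RAM 0: one lock bit per system unit
    r1: int = 0    # RAM 1: reserve bits R1 (lowest reserve priority)
    r2: int = 0    # RAM 2: reserve bits R2
    r3: int = 0    # RAM 3: reserve bits R3 (assumed highest priority)

    def is_locked(self) -> bool:
        # The entry is locked if any port's lock bit is set.
        return self.lock != 0

    def set_lock(self, port: int) -> None:
        self.lock |= 1 << port

    def clear_lock(self, port: int) -> None:
        self.lock &= ~(1 << port)
```

Keeping the lock bits and the three reserve bits as parallel masks mirrors the idea of reusing the same position in each of the four RAM structures.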

Referring now to FIG. 3, there is shown an 
upper level flowchart defining the basic procedure 
involved in implementing the lock request processing 
and synchronizing scheme of this invention. As 
shown therein, the overall procedure 130 is initiated 
at step 131 where memory access requests are 
received and stored. At step 132, the stored requests 
are arbitrated in order to select a particular 
request for execution. Arbitration is performed on 
the basis of availability of resources required to 
execute a request while, at the same time, ensuring 
that a request that has been waiting the longest for 
corresponding resources to become available is 
given the highest priority, as described in detail in 
the above-referenced Flynn et al. patent application 
entitled "Method And Means For Arbitrating 
Communication Requests Using A System Control Unit 
In A Multi-Processor System". Subsequently, at 
step 134, a determination is made, on the basis of 
the command field accompanying the memory access 
request, as to whether or not the request is a 
"lock" request. If the answer at step 134 is no, step 
135 is accessed where the request at issue is 
executed in the normal fashion and step 132 is 
accessed again to proceed with the selection of the 
subsequent request. 

However, if it is found at step 134 that the 
request is indeed of the lock type, a determination 
is made at step 136 as to whether or not the cache 
block corresponding to the memory access request 
has previously been locked by a request from 
another system unit or port. If the answer at step 
136 is yes, step 137 is accessed where the lock 
request is denied and the request is returned to the 
requesting port. 

Subsequently, at step 138, the denied request 
is placed on a reserve list prior to returning to step 
132 for arbitration of other outstanding requests. 
The specific procedure involved in developing a 
reserve list will be discussed below in detail. If the 
answer at step 136 is found to be in the negative, 
i.e., the corresponding cache block is not locked, 
the lock request is granted at step 139 and the 
addressed cache block is set to a locked status. 
Subsequently, at step 140 the granted memory 
request is executed before returning to step 132 for 
a selection of another access request for execution. 
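
The top-level flow of FIG. 3 can be sketched as follows. This is a simplified illustration, not the disclosed implementation: the function name `process_requests`, the tuple layout of a request, and the dictionary and list used for lock state and the reserve list are assumptions, and arbitration (step 132) is reduced to plain arrival order rather than the resource-based scheme referred to above.

```python
from collections import deque


def process_requests(requests, locked_blocks, reserve_list):
    """Process (port, block, is_lock) requests; return (port, outcome) pairs.

    locked_blocks maps a cache block to the port holding its lock.
    reserve_list collects denied (port, block) lock requests in order.
    """
    pending = deque(requests)                        # step 131: receive and store
    results = []
    while pending:
        port, block, is_lock = pending.popleft()     # step 132 (FIFO stand-in)
        if not is_lock:                              # step 134: not a lock request
            results.append((port, "executed"))       # step 135
        elif block in locked_blocks:                 # step 136: already locked
            results.append((port, "denied"))         # step 137
            if (port, block) not in reserve_list:
                reserve_list.append((port, block))   # step 138
        else:
            locked_blocks[block] = port              # step 139: grant, set locked
            results.append((port, "granted"))        # step 140: execute request
    return results
```

For example, two lock requests for the same block followed by an ordinary access yield one grant, one denial placed on the reserve list, and one normal execution.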

Referring now to FIG. 4, there is shown a lower 
level flow chart illustrating in detail the procedure 
involved in determining whether a lock request 
being processed is to be granted, denied, or 
placed on the reserve list. The detailed sequence 
150 is initiated at step 151 by the receipt of a lock 
request from the SCU. At step 152, bits R1 and R2 
of the reserve list are initialized by assigning values 
equal to Temp 1 and Temp 2, respectively. In FIG. 
2, for instance, the reserve list is shown as having 
a depth of three (3) bits, as represented by the 
RAM areas marked as R1, R2, and R3 in RAM 
structures RAM 1, RAM 2, and RAM 3, respectively. 
The flowchart of FIG. 4 also represents the 
reservation of denied requests relative to a 3-bit 
reserve list. At step 153, a check is made to see if 
any of the lock bits corresponding to the system 
units (as represented by the lock bit in the RAM 
structure RAM 0 in FIG. 2) is set. If none of the 
lock bits is found to be set, step 154 is accessed 
where a check is made to see if any of the reserve 
bits (R1, R2, and R3 in FIG. 2) are set. If none of 
the reserve bits is found to have been set, the lock 
bit corresponding to the requesting port is set at 
step 155 and the lock request is granted at step 
156. If any of the reserve bits is found to have 
been set at step 154, step 157 is normally accessed 
where a determination is made as to whether 
or not the reserve bit corresponding to the requesting 
port, i.e., the port originating the lock request, 
is set. If the answer is no, a check is made at step 
158 to see if any of the reserve bits in the reserve 
list or queue are available. A negative answer at 
step 158 leads to the denial of the requested lock 
at step 159. If a reserve bit is in fact found to be 
available, step 160 is accessed where the reserve 
bit corresponding to the requesting port and having 
the lowest reserve priority (R1) is set. Subsequently, 
step 161 is reached where a shift routine is 
employed for updating the reserve list, as will be 
explained below. If it is found at step 157 that the 
reserve bit for the requesting port has already been 
set, a check is made at step 162 to determine if 
the requesting port has been reserved with the 
highest available reserve priority. If the answer at 
step 162 is found to be negative, i.e., there exists 
at the time an outstanding lock request having a 
higher priority, the current lock request is denied at 
step 161. 

If the answer to step 153 is affirmative, i.e., at 
least one of the lock bits is found to have been set, 
a check is made at step 163 to see if any of the 
reserve bits are available. If no reserve bits are 
available, the lock request is denied at step 164. 
However, if the answer at step 163 is yes, step 165 
is accessed where a check is made to see if the 
requesting port has its reserve bit set. If the answer 
is true, the corresponding lock request is denied at 
step 164. If the reserve bit for the requesting port 
is not set, the reserve bit for that port is set to the 
lowest reserve priority at step 166 and, subsequently, 
step 164 is accessed and the lock request 
is denied. At the same time, the procedure for 
updating the reserve list is employed. 
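
The decision sequence of FIG. 4 can be summarised in a short sketch. It is a hedged illustration under stated assumptions: the reserve bits are abstracted into an ordered Python list (oldest, i.e. highest priority, first), the function name `handle_lock_request` is hypothetical, and two outcomes that the text leaves implicit are assumed here, namely that a port holding the highest reserve priority is granted the lock at step 162 and that a port newly reserved at step 160 is denied on that pass.

```python
RESERVE_DEPTH = 3  # the reserve list of FIG. 2 is three deep (R1, R2, R3)


def handle_lock_request(port, lock_holder, reserve):
    """Decide one lock request for a single block.

    lock_holder: port currently holding the lock, or None if no lock bit is set.
    reserve: reserved ports, highest reserve priority first (mutated in place).
    Returns (outcome, new_lock_holder).
    """
    if lock_holder is None:                       # step 153: no lock bit set
        if not reserve:                           # step 154: no reserve bit set
            return "granted", port                # steps 155-156
        if port in reserve:                       # step 157: already reserved
            if reserve[0] == port:                # step 162: highest priority
                reserve.pop(0)                    # assumed: grant clears the entry
                return "granted", port
            return "denied", None                 # higher-priority request waiting
        if len(reserve) < RESERVE_DEPTH:          # step 158: a reserve bit is free
            reserve.append(port)                  # step 160 and list update
        return "denied", None                     # step 159 (or assumed denial)
    # Step 153 affirmative: a lock bit is already set.
    if len(reserve) < RESERVE_DEPTH and port not in reserve:  # steps 163, 165
        reserve.append(port)                      # step 166: lowest priority
    return "denied", lock_holder                  # step 164
```

Note that once the lock is released, a later-reserved port is still denied until the port ahead of it on the list has been served, which is the fairness property the reserve list is meant to provide.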

Turning now to FIG. 5, there is shown a flow 
chart illustrating the procedure involved in updating 
the reserve list. As evident from the above discussion, 
the procedure X (designated generally as 
170) is accessed anytime that a requesting port's 
reserve bit is to be set. At step 171, the initial value 
Temp 1 assigned to reserve bit R1 is tested to see 
if it is equal to 0. If the answer is yes, it is an 
indication that no other entries have previously 
been made in the reserve list and, accordingly, 
merely setting the reserve bit having the lowest 
priority, i.e., R1, for the requesting port is sufficient 
and the updating procedure comes to a halt at step 
172. 

However, if the answer at step 171 reveals that 
Temp 1 has a value not equal to 0, it is an 
indication that one or more requests have been 
placed on reserve on the reserve list. Subsequently, 
step 173 is accessed where the value Temp 1 
in reserve bit R1 is shifted into the reserve bit R2. 
At step 174, the value Temp 2 initially loaded into 
the reserve bit R2 is tested to see if it is equal to 0. 
An affirmative answer at step 174 again leads to 
the completion of the updating procedure at step 
172. If the value Temp 2 is not equal to 0, step 175 
is accessed where the value Temp 2 is shifted into 
the reserve bit R3 and, subsequently, the updating 
is completed. The above procedure ensures that 
memory requests are placed on the reserve list in 
sequential order so that each time a new request is 
added to the list, the priority of previously reserved 
requests is increased by one level. Thus, the oldest 
outstanding request on reserve is always arbitrated 
for grant of a lock request. 
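
The shift procedure of FIG. 5 amounts to pushing a new entry into a three-stage queue. The sketch below is an illustration under assumptions: R1, R2, and R3 are modelled as integer masks in which the reserved port's bit is set, the function name `update_reserve_list` is hypothetical, and the caller is assumed to have already confirmed that a reserve bit is available (R3 is empty).

```python
def update_reserve_list(r1, r2, r3, port_bit):
    """Reserve port_bit at the lowest priority and age the older entries.

    Returns the new (r1, r2, r3) values.
    """
    temp1, temp2 = r1, r2      # values held in R1 and R2 before the update
    r1 = port_bit              # newly reserved port takes the lowest priority
    if temp1 == 0:             # step 171: list was empty
        return r1, r2, r3      # step 172: nothing to shift
    r2 = temp1                 # step 173: old R1 moves up to R2
    if temp2 == 0:             # step 174: R2 was empty
        return r1, r2, r3      # step 172
    r3 = temp2                 # step 175: old R2 moves up to R3
    return r1, r2, r3
```

Reserving ports 0, 1, and 2 in turn leaves port 0 in R3, port 1 in R2, and port 2 in R1, so the oldest request holds the highest priority.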

According to a further aspect of this invention, 
the illustrative scheme described above with reference 
to FIGS. 3-5 may conveniently be adapted to 
provide means for overriding the reserve priority 
defined by the sequential storage of denied lock 
requests on the reserve list if needed to ensure that 
a requesting CPU is not kept waiting on the reserve 
list for an extended period of time. Despite the 
fairness built into the scheme described above, it is 
possible for a requesting unit to remain unserviced 
as it moves upward along the reserve list; this is 
particularly of concern with certain lock requests 
from I/O units, which can take a prolonged period 
of time for servicing memory access requests once 
an associated lock has been granted. Hence, it is 
possible for a reserved CPU port to "time out" or 
crash if its built-in maximum period of waiting expires 
while the CPU is on the reserve list. Under 
these conditions, the lock request synchronizing 
scheme can be adapted to set up a flag while the 
reserve list is being updated, indicating that a requesting 
port is to be bumped up to the top of the 
reserve list provided certain predefined conditions 
exist. 

In FIG. 4, for instance, if the test at step 154 
indicates that at least one reserve bit has been set, 
the sequential procedure can be routed to routine 
"A" as shown in FIG. 6A. Routine "A" includes a 
test made at step 180 to see if (1) the requesting 
port is a CPU, (2) the port has the second highest 
reserve priority, and (3) the port having the highest 
priority at the time is a CPU. These conditions are 
defined on the basis that a newly reserved port is 
not to be bumped up to a higher priority and an I/O 
port having the outstanding highest priority is not to 
be displaced therefrom. If these conditions are 
satisfied, step 181 is accessed where a warning bit 
is set before returning to step 157 of the procedure 
shown in FIG. 4. Subsequently, if it is found at step 
162 that the requesting port does not have the 
highest priority, routine "B" is accessed. As shown 
in FIG. 6B, routine "B" starts at step 182 where a 
check is made to see if the warning bit is set. If the 
answer is yes, it is an indication that the requesting 
port may be close to "crashing" and, hence, the 
reserve priority of the port is instantaneously 
bumped up (at step 183) and the requested lock is 
granted after clearing the highest reserve list. If 
step 182 reveals that no warning bit has been set, 
the lock is denied at step 161 (FIG. 4). It will be 
obvious that routines "A" and "B" of FIGS. 6A 
and 6B need not be executed if there is no reason 
to monitor the possibility of a CPU timing out. 
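
The override conditions of routines "A" and "B" can be sketched as two small functions. This is an illustration only: the names `routine_a` and `routine_b`, the `is_cpu` lookup table, the `warned` set standing in for the warning bit, and the list-based reserve list (highest priority first) are assumptions. In particular, removing the bumped port's reservation on grant is one possible reading of the clearing performed at step 183, which the text does not spell out.

```python
def routine_a(port, reserve, is_cpu, warned):
    """Steps 180-181: set the warning bit if all three conditions hold."""
    if (len(reserve) >= 2
            and is_cpu[port]              # (1) requester is a CPU
            and reserve[1] == port        # (2) requester has second highest priority
            and is_cpu[reserve[0]]):      # (3) highest-priority port is a CPU
        warned.add(port)                  # step 181


def routine_b(port, reserve, warned):
    """Steps 182-183: bump a warned port and grant; otherwise deny."""
    if port in warned:                    # step 182: warning bit set
        warned.discard(port)
        reserve.remove(port)              # assumed: grant clears its reservation
        return "granted"                  # step 183
    return "denied"                       # step 161 of FIG. 4
```

With ports 0 and 1 as CPUs and port 2 as an I/O unit, a second-place CPU behind another CPU is flagged and then granted, whereas a CPU waiting behind an I/O port is never flagged and is denied.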



Claims 

1. A method of processing locked memory 
access requests in a multiprocessor computer sys- 
tem comprising: storing a directory of lock bits, 
each lock bit corresponding to a predefined seg- 
ment of memory locations, the status of each of the 
lock bits representing whether or not the corre- 
sponding memory segment is locked; ascertaining, 
for each lock request being processed, whether or 
not the lock bit corresponding to the memory seg- 
ment containing the memory location being ad- 
dressed by the request is set; and granting the 
request if the corresponding lock bit is not set, and 
denying the request if the corresponding lock bit is 
set. 

2. A method of processing locked memory 
access requests as claimed in Claim 1 in which all 
requests for access to memory locations within the 
same segment are mapped to the same lock bit in 
the directory. 

3. A method of processing locked memory 
access requests as claimed in Claim 1 or Claim 2 
in which each request which is denied is stored on 
a reserve list, unless the list is full or the request 
has already been so stored, in sequential order. 

4. A method of processing locked memory 
access requests as claimed in Claim 3 in which the 
oldest request on the reserve list is granted when 
the lock bit which originally resulted in the said 
request being denied is substantially found to be 
unset. 

5. A multiprocessor computer system including 
a system control unit (SCU) (14) for operating a 
plurality of central processor units (CPUs) (12) in a 
parallel fashion with at least one input/output (I/O) 
interface (20), the system being characterised by 
means (100) for storing within the SCU (14) a 
directory of lock bits, each lock bit corresponding 
to a predefined segment of memory locations, the 
status of each of the lock bits representing whether 
or not the corresponding memory segment is 
locked; means for determining, for each lock request 
being processed, whether or not the lock bit 
corresponding to the memory segment containing 
the memory location being addressed by the request 
is set; and means for granting the request if 
the corresponding lock bit is not set and for denying 
the request if the corresponding lock bit is 
set. 

6. A multiprocessor computer system as 
claimed in Claim 5 in which each of the lock bits in 
the lock directory corresponds to a block of memory 
defined by a CPU cache, all memory addresses 
which are associated with a memory access 
request and which fall within a particular 
cache block being mapped to the same lock bit. 

7. A multiprocessor computer system as 
claimed in any one of Claims 5 to 6 including 
means for defining a reserve list of denied requests, 
and means for storing each request which 
is denied on the reserve list, unless the list is full or 
the request has already been so stored, in sequential 
order. 

8. A multiprocessor computer system as 
claimed in Claim 7 including means for granting 
the oldest request on the reserve list when the lock 
bit which originally resulted in the said request 
being denied is substantially found to be unset. 

9. A multiprocessor computer system as 
claimed in Claim 7 or Claim 8 including override 
means adapted to advance a request in the reserve 
list, other than the oldest request, when an override 
condition exists for the said request. 